
Comparisons of machine learning techniques for detecting malicious webpages


Abstract

This paper compares machine learning techniques for detecting malicious webpages. The conventional approach is to check whether a webpage appears on a blacklist, a list of webpages classified as malicious from a user's point of view. Blacklists are created by trusted organizations and volunteers and are used by modern web browsers such as Chrome, Firefox and Internet Explorer. However, blacklists are ineffective for three reasons: webpages change frequently, the growing number of webpages poses scalability issues, and crawlers cannot visit intranet webpages that require computer operators to log in as authenticated users. This paper therefore applies machine learning algorithms as an alternative approach to detecting malicious webpages. Three supervised techniques (K-Nearest Neighbor, Support Vector Machine and Naive Bayes Classifier) and two unsupervised techniques (K-Means and Affinity Propagation) are employed. K-Means and Affinity Propagation have not previously been applied to the detection of malicious webpages by other researchers. All of these techniques were used to build predictive models that analyze a large number of malicious and safe webpages. The webpages were downloaded by a concurrent crawler built on gevent, then parsed, and features such as content, URL and screenshots were extracted and fed into the machine learning models. Computer simulations produced an accuracy of up to 98% for the supervised techniques and a silhouette coefficient close to 0.96 for the unsupervised techniques. The predictive models have been applied in a practical context in which Google Chrome can harness classifiers that combine the advantages of lightweight and heavyweight classifiers.
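
The abstract reports two kinds of scores: classification accuracy for the supervised models and a silhouette coefficient for the clustering models. The following is a minimal sketch of how such scores can be computed, not the authors' implementation. It uses only the Python standard library, a plain K-Nearest Neighbor vote, and hand-assigned cluster labels in place of K-Means or Affinity Propagation output. The two-dimensional feature vectors are hypothetical stand-ins for the content, URL and screenshot features mentioned above, and all function names are illustrative.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, x, k=3):
    """Label x by majority vote among its k nearest training points."""
    ranked = sorted(zip(train_X, train_y), key=lambda p: euclidean(p[0], x))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def silhouette_coefficient(X, labels):
    """Mean of s(i) = (b - a) / max(a, b) over all points.

    a is the mean distance from a point to the other members of its
    cluster; b is the smallest mean distance to any other cluster.
    Requires at least two clusters.
    """
    clusters = {}
    for point, label in zip(X, labels):
        clusters.setdefault(label, []).append(point)
    scores = []
    for point, label in zip(X, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # The point's distance to itself is 0, so divide by len(own) - 1.
        a = sum(euclidean(point, q) for q in own) / (len(own) - 1)
        b = min(
            sum(euclidean(point, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical, hand-made feature vectors (not data from the paper).
train_X = [(0.10, 0.00), (0.20, 0.10), (0.15, 0.05),
           (0.90, 0.80), (0.80, 0.90), (0.85, 0.85)]
train_y = ["safe", "safe", "safe", "malicious", "malicious", "malicious"]
test_X = [(0.12, 0.02), (0.88, 0.90)]
test_y = ["safe", "malicious"]

predictions = [knn_predict(train_X, train_y, x, k=3) for x in test_X]
acc = accuracy(test_y, predictions)

# Cluster labels assigned by hand here; in practice they would come from
# a clustering algorithm such as K-Means or Affinity Propagation.
cluster_labels = [0, 0, 0, 1, 1, 1]
sil = silhouette_coefficient(train_X, cluster_labels)

print(f"accuracy: {acc:.2f}")
print(f"silhouette coefficient: {sil:.2f}")
```

A silhouette coefficient near 1 indicates compact, well-separated clusters, which is how a value close to 0.96 should be read. Accuracy, by contrast, requires labeled test pages and therefore applies only to the supervised techniques.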

Bibliographic details

  • Authors

    Kazemian, Hassan; Ahmed, S.;

  • Author affiliation
  • Year: 2015
  • Total pages
  • Original format: PDF
  • Language: en
  • Chinese Library Classification
